Coefficient tree regression: fast, accurate and interpretable predictive modeling
Abstract
The proliferation of data collection technologies often results in large data sets with many observations and many variables. In practice, highly relevant engineered features are often groups of predictors that share a common regression coefficient (i.e., the predictors in the group affect the response only via their collective sum), where the groups are unknown in advance and must be discovered from the data. We propose an algorithm called coefficient tree regression (CTR) to discover the group structure and fit the resulting regression model. In this regard, CTR is an automated way of engineering new features, each of which is the collective sum of the predictors within a group. CTR can be used when the number of variables is larger than, or smaller than, the number of observations. Creating new features in a similar manner improves predictive modeling, especially in domains where the relationships between predictors are not known a priori. CTR borrows computational strategies from both linear regression (fast model updating when adding/modifying a feature in the model) and regression trees (fast partitioning to form and split groups) to achieve outstanding computational and predictive performance. Finding features that represent hidden groups of predictors (i.e., a hidden ontology) that impact the response only via their collective sum also has major interpretability advantages, which we demonstrate with a real data example of predicting political affiliations from television viewing habits. In numerical comparisons over a variety of examples, we demonstrate that both the computational expense and the predictive performance of CTR are far superior to existing methods that create features as groups of predictors. Moreover, CTR has overall predictive performance that is comparable to or slightly better than the regular lasso method, which we include as a reference benchmark for comparison even though it is non-group-based, in addition to having substantial computational and interpretive advantages over lasso.
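The model structure described in the abstract (each group of predictors enters the regression only through its sum, with one shared coefficient) can be illustrated with a minimal NumPy sketch. This is not the CTR algorithm: CTR discovers the groups from data, whereas here the groups are assumed to be known in advance. The function names `group_sum_features` and `fit_shared_coefficients`, the group assignment, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def group_sum_features(X, groups):
    """Build one derived feature per group: the row-wise sum of that group's columns."""
    return np.column_stack([X[:, idx].sum(axis=1) for idx in groups])

def fit_shared_coefficients(X, y, groups):
    """Ordinary least squares with an intercept and one shared coefficient per group."""
    Z = group_sum_features(X, groups)
    design = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Simulated data (assumption): columns 0-2 share coefficient 2.0,
# columns 3-4 share coefficient -1.0, and column 5 is irrelevant.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
groups = [[0, 1, 2], [3, 4]]  # hypothetical groups, assumed known here
true_beta = np.array([2.0, 2.0, 2.0, -1.0, -1.0, 0.0])
y = 0.5 + X @ true_beta + 0.1 * rng.normal(size=n)

intercept, group_coefs = fit_shared_coefficients(X, y, groups)
print("intercept:", intercept)
print("shared coefficients per group:", group_coefs)
```

Fitting on the two group sums instead of the six raw columns estimates two coefficients rather than six, which is the source of the interpretability benefit the abstract refers to. The hard part that CTR addresses, finding the groups when they are not known, is deliberately left out of this sketch.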
Similar resources
Learning Accurate, Compact, and Interpretable Tree Annotation
We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple Xbar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as approp...
Fast Predictive Simple Geodesic Regression
Deformable image registration and regression are important tasks in medical image analysis. However, they are computationally expensive, especially when analyzing large-scale datasets that contain thousands of images. Hence, cluster computing is typically used, making the approaches dependent on such computational infrastructure. Even larger computational resources are required as study sizes i...
The Difference Between Predictive Modeling and Regression
Predictive modeling includes regression, both logistic and linear, depending upon the type of outcome variable. However, as the datasets are generally too large for a p-value to have meaning, predictive modeling uses other measures of model fit. Generally, too, there are enough observations so that the data can be partitioned into two or more datasets. The first subset is used to define (or tra...
Accurate and efficient processor performance prediction via regression tree based modeling
Computer architects usually evaluate new designs using cycle-accurate processor simulation. This approach provides a detailed insight into processor performance, power consumption and complexity. However, only configurations in a subspace can be simulated in practice due to long simulation time and limited resource, leading to suboptimal conclusions which might not be applied to a larger design...
Journal
Journal title: Machine Learning
Year: 2021
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-021-06091-7